A Robust Supervised Variable Selection for Noisy High-Dimensional Data
نویسندگان
چکیده
The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection represents a successful methodology for dimensionality reduction, which is suitable for high-dimensional data observed in two or more different groups. Various available versions of the MRMR approach have been designed to search for variables with the largest relevance for a classification task while controlling for redundancy of the selected set of variables. However, usual relevance and redundancy criteria have the disadvantages of being too sensitive to the presence of outlying measurements and/or being inefficient. We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. Particularly, redundancy is measured by a new regularized version of the coefficient of multiple correlation and relevance is measured by a highly robust correlation coefficient based on the least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets. To investigate the influence of noise or outliers on the data, we perform the computations also for data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers.
منابع مشابه
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...
متن کاملSemi-supervised feature learning for hyperspectral image classification
Hyperspectral image has high-dimensional Spectral–spatial features, those features with some noisy and redundant information. Since redundant features can have significant adverse effect on learning performance. So efficient and robust feature selection methods are make the best of labeled and unlabeled points to extract meaningful features and eliminate noisy ones. On the other hand, obtaining...
متن کاملSupervised Feature Extraction of Face Images for Improvement of Recognition Accuracy
Dimensionality reduction methods transform or select a low dimensional feature space to efficiently represent the original high dimensional feature space of data. Feature reduction techniques are an important step in many pattern recognition problems in different fields especially in analyzing of high dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...
متن کاملA Nonlinear Mixture Model based Unsupervised Variable Selection in Genomics and Proteomics
Typical scenarios occurring in genomics and proteomics involve small number of samples and large number of variables. Thus, variable selection is necessary for creating disease prediction models robust to overfitting. We propose an unsupervised variable selection method based on sparseness constrained decomposition of a sample. Decomposition is based on nonlinear mixture model comprised of test...
متن کاملRobust Unsupervised Feature Selection on Networked Data
Feature selection has shown its effectiveness to prepare high-dimensional data for many data mining and machine learning tasks. Traditional feature selection algorithms are mainly based on the assumption that data instances are independent and identically distributed. However, this assumption is invalid in networked data since instances are not only associated with high dimensional features but...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
دوره 2015 شماره
صفحات -
تاریخ انتشار 2015